arXiv: 1501.03379v7 [stat.CO] 1 Apr 2016 


Control Functionals for Quasi-Monte Carlo Integration 


Chris. J. Oates 

University of Technology Sydney and 


Abstract 

Quasi-Monte Carlo (QMC) methods are be¬ 
ing adopted in statistical applications due to 
the increasingly challenging nature of numer¬ 
ical integrals that are now routinely encoun¬ 
tered. For integrands with d-dimensions and 
derivatives of order a, an optimal QMC rule 
converges at a best-possible rate 
However, in applications the value of a can 
be unknown and/or a rate-optimal QMC rule 
can be unavailable. Standard practice is to 
employ Oi-optimal QMC where the lower 
bound < a is known, but in general this 
does not exploit the full power of QMC. One 
solution is to trade-off numerical integration 
with functional approximation. This strat¬ 
egy is explored herein and shown to be well- 
suited to modern statistical computation. A 
challenging application to robotic arm data 
demonstrates a substantial variance reduc¬ 
tion in predictions for mechanical torques. 

1 Introduction 

Consider a Lebesgue-integrable test function / : A —>■ 
K defined on a bounded measurable subspace A C 
{d € N) with square integrable derivatives of order a > 
0 in each variable. Our focus is numerical computation 
of the integral /[/] := f(x)dx. The Quasi-Monte 
Carlo (QMC) approach is based on an approximation 

1 ^ 
n—1 

where the (possibly random) design points x^'^ = 
{x^,... ,x^} C A have low discrepancy; that is, the 
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points are ‘well-spaced’ in a precise sense defined be¬ 
low. This contrasts with the Monte Carlo (MC) ap¬ 
proach whereby the design points are sampled inde¬ 
pendently from a uniform distribution over A. MC 
integration achieves a root mean square error (RMSE) 
convergence rate of whereas QMC integra¬ 

tion can in principle achieve a rate 0(iV““/'^) on spe¬ 
cific geometric sequences [H]. It is known 

that this rate is best-possible m and explicit algo¬ 
rithms to generate design points that attain this rate 
are now available for many (but not all) values of a [5] . 
Challenging integration problems are common in con¬ 
temporary statistics, for example when computing ex¬ 
pectations, marginal probability densities or normalis¬ 
ing constants, and QMC methods are therefore gaining 
importance in statistical applications [mini Eg. 

Contrary to the above theoretical considerations, rate- 
optimal QMC is often not employed in practice. This 
is mainly due to three reasons; either (Rl) the smooth¬ 
ness parameter a is unknown, (R2) there does not cur¬ 
rently exist an explicit QMC rule that is rate-optimal 
for functions of smoothness a, or (R3) it is simply 
more convenient to employ a basic QMC rule based 
on a weaker smoothness assumption < a, as imple¬ 
mented in standard software. In each situation there 
is a gap between theory and practice that, as we show 
in this paper, can be bridged using functional approx¬ 
imation. 

Previous work on variance reduction techniques for 
QMC includes [1], who considered modified impor¬ 
tance sampling strategies, and |14] . who considered 
constructing control variates for QMC. Neither ap¬ 
proach improved the asymptotic error rate, though in 
some cases the QMC error was reduced by a constant 
factor. Interestingly, |T4j reports some quite negative 
results for control variate strategies in this setting, be¬ 
cause the objective being minimised by QMC is not 
equivalent to the MC variance that is minimised by 
control variates. |33| demonstrates variance reductions 
in QMC are possible using additive approximations, 
though again the asymptotics were unchanged. 

This paper studies a general approach to variance re- 
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duction for QMC rules, building on kernel methods 
and recent work in the Monte Carlo setting due to 
[221 ED- The mathematics that underpins our work 
comes from the functional approximation literature. 
This takes the form of a ‘control functional’ t/j ■ ^ ^ 

that satisfies (i) ijj integrates to zero, (ii) f — 'tp is more 
amenable to QMC methods than /, in a precise sense. 
The general approach that we explore is to replace the 
integrand f hy f — tp and target the QMC objective di¬ 
rectly. This can lead to accelerated asymptotics. The 
main contribution of this paper is to explore this strat¬ 
egy in the settings (Rl-3) above. Theoretical analysis 
of convergence rates is provided, along with empirical 
results and a challenging application to robotics. We 
begin by presenting some background on QMC the¬ 
ory below, before describing the methodology in more 
detail. 

2 Background 

QMC is naturally studied in reproducing kernel 
Hilbert spaces (RKHS; i)- Below we draw connec¬ 
tions with kernel methods, that are themselves natu¬ 
rally studied in RKHS. 

Notation. We work in a Hilbert space H, consist¬ 
ing of measurable functions / : A" —>■ K. For simplic¬ 
ity of presentation we assume H includes the constant 
functions. We follow the mainstream QMC literature 
by taking X = [0,1]“*, equipped with the Euclidean 
norm ||a;|| := Denote the scalar prod¬ 

uct and norm on H by {■,-)h and || • \\h respectively. 
Suppose further that iJ is a RKHS with kernel K : 
[0, lY X [0,1]“* —>■ K; that is, K satisfies (i) K{-,x) € H 
for all X G [0,1]“* and (ii) f{x) = {f,K{-,x))H for all 
f G H and all x G [0,1]“*. K is assumed to be non¬ 
trivial, i.e. K Y 0. 

Quadrature Error Analysis. The quadrature 
methods that we focus on aim to minimise the ‘worst 
case’ integration error which, for design points x^--^ 
and Hilbert space If, is defined to be 

eff(x^'^) := sup \Q[f;x^'^]-I[f]\ (1) 

ll/llir<l 

where the supremum is taken over all test functions 
/ belonging to the unit ball in H. It follows from 
linearity that, for any function f G H, the integration 
error obeys 

\Q[f;x^-^] - I[f]\ < eH{x'^''^)\\f\\H- ( 2 ) 

The worst case error enix^'^) is the usual target 
of QMC innovation, with chosen to (approxi¬ 

mately, asymptotically) minimise enix^'-^) [8]. Note 
that Eqn. [l] is also the ‘maximum mean discrepancy’ 


(MMD), as studied extensively in the kernel methods 
literature HEO]- 

Quadrature is naturally studied in RKHS because 
there exists a closed-form expression for the worst case 
error in terms of the kernel K, which facilitates the 
principled selection of design points |S]: 


eH{x^'-^r = 


/ / K{x,y)dxdy 

J afo.iF 


'[ 0 . 1 ] 

N 


--T 

N ^ 


N 


K{x'^,y)dy 




(3) 


m,n—l 


The mainstream QMC literature supposes H is a 
Sobolev space of known order a (defined below). In 
this setting, 0{N~°‘/Y is the best-possible rate for the 
worst case error when x^'^ are chosen deterministi¬ 
cally and is the best-possible RMSE 

when are allowed to be random [T9j. We will re¬ 
fer to QMC rules that achieve these optimal rates as 
‘a-QMC rules’. 

This paper focuses on improving performance in the 
situation where a (sub-optimal) az,-QMC rule is used 
to integrate a test function of smoothness a > aL- For 
reasons (Rl-3), this scenario is commonly encountered 
in statistical applications. In contrast to QMC |5] (and 
kernel methods that aim to minimise the MMD 0), 
the rate constant ||/||_f/ is the primary target of our 
methodology below. 


3 Methodology 

Control Functionals for QMC. The approach that 
we pursue in this paper aims to construct a Lebesgue- 
integrable functional tp : [0,1]“' —)■ K that satisfies 

I[Y] = 0. (4) 

When X has the interpretation of a random variable, 
'ipi.x) is classically known as a ‘control variate’ |14) . 
When Ip itself is estimated, we follow [22] and refer to 
the entire mapping i/; as a ‘control functional’ (CF). In 
the CF approach to estimation, the test function / is 
replaced hy f — tp] h is hoped that the latter is more 
amenable to numerical integration. Clearly I[f — pj] = 
/[/]. In this paper we construct a CF ipN based on 
a tractable approximation /jv to /. (The dependence 
on N will be explained below.) It is required that the 
integral /[/tv] is available in closed-form. We then set 

IpNix) = fN{x) - /[/tv] (5) 

SO that ipN satisfies Eqn. [^ For this to make sense 
mathematically, it must be the case that f^GH and 
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this informs our method of approximation (the con¬ 
stant function with value /[/at] belongs to H by as¬ 
sumption). Intuitively, a good CF will provide a 
close approximation to fluctuations of the test function 
/, so that the functional difference / — ipN become 
increasingly ‘flat’ and thus more amenable to QMC 
methods. More precisely, motivated by Eqn. [^we aim 
to construct a CF such that ||/ —< ||/||i/- This 
connection with functional approximation offers the 
possibility to leverage kernel methods for these prob¬ 
lems, see e.g. imisi]. 

Control Functional Error Analysis. Consider par¬ 
titioning x^'-^ into two sets and where 

1 < M < iV and M/N —>• c G (0,1) as A —>• oo. 
The first set , possibly non-random, will be used 
in a preliminary step to construct an approximation 
to /. Then the second set possi¬ 

bly random, is used to evaluate the ‘CF estimator’ 

= Q[f- 

( 6 ) 

We remark that if the points v" are ran¬ 
dom and marginally distributed as C/([0,1]'^) then 
will be an unbiased estimator for 
/[/]. Error analysis for the CF estimator is based on 
the following: 

Theorem 1. Given f,fM € H, we have 

\E[f-u^--^,v^+^--^]-I[f]\ 

Proof. Since /, /m & H we have that / — /m G H. 
The result then follows by applying the fundamental 
inequality from Eqn. to the function / — /m and 
using linearity of the integral operator /. □ 

Thus the CF methodology produces an estimator 
E[f; u^'-^, yM+i:N-^ asymptotically zero error 

relative to standard QMC estimators, providing that 
it is possible to construct an approximation /m to / in 
such a way that ||/ — t 0 as M —>■ oo. 

The next sections establish convergence rates for func¬ 
tional approximation using kernel methods. 

Sobolev Spaces. To achieve consistent approxima¬ 
tion 11/ — furWH —t 0 it is necessary to impose regu¬ 
larity conditions on H. Sobolev spaces are a general 
setting in which to formulate such regularity assump¬ 
tions; our main reference here is m- Firstly suppose 
that k G No, k > d/2 and f < p < oo. For a multi¬ 
index a G Nq we write |a| = oi -I- • • • -I- Od- Define the 


‘p-Sobolev space of order fc’ to be 

:= {/ : [0, f]"^ ^ K I D“/ exists and 

D°‘f G Tp([0, l]‘’*),Va G Nq with |a| < k}. 

Here D°'f denotes the weak (or ‘distributional’) 
derivative of /; the reader is referred to the above 
reference for details. Clearly W^'P is a vector space 
over M when addition and (scalar) multiplication are 
defined point-wise. For the special case p = 2 we equip 
W^’’^ with the inner product 

(/,5)fc:= E I[D^fD‘^9] 

aen'/.\a\<k 

and denote this inner-product space := 

(W^’2, (•, •)fe). Defined in this way, is a Hilbert 
space of functions whose (weak) derivatives exist up 
to order k. Moreover El^ can be made into a RKHS 
with an appropriate choice of kernel (see below). Our 
results below apply also to Sobolev spaces with non¬ 
integer k, but this construction is more technical and 
we refer the reader to m for details. 

Approximation in Sobolev Spaces. Our assump¬ 
tions are naturally stated using Sobolev spaces: Given 
two Hilbert spaces H, H', defined on the same ele¬ 
ment set, with norms || • \\h, || • \\h', we say that El 
and E[' are ‘norm-equivalent’, written E[ = E[', when¬ 
ever there exist positive constants ci, C 2 such that 
cill/lliT < ll/llff' <C 2 ||/||j^ for all/GH. 

Assumption E. El = where ul > d/2. 

Assumption 2: f G H°‘ where a> a^. 

Assumption 1 is a technical requirement to ensure the 
space H (where QMC is performed) admits consistent 
functional approximation. Assumption 2 ensures that 
the test function / is ‘smooth enough’ for a^-QMC 
methods to converge at the ai,-rate. This follows from 
the fact that Sobolev spaces are nested, so that / G 
i/a / G H°‘^. 

For consistent approximation of / it is necessary to 
base our approximation /m in a space iJ* of functions 
that are ‘at least as smooth’ as /: 

Assumption 3: iJ* = where ajj > a. 

It follows again from the nested property that /m G 
and thus the functional difference / — /m exists 
in iJ“^. The Sobolev spaces can be characterised 
as RKHS via an appropriate reproducing kernel A*, 
such as the well-known Matern kernel. 

Finally an approximation /m to / is constructed based 
on the points as follows: 

M 

fM{x\u^'-^) := E finK^{x,u^) 

n—1 


( 8 ) 
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where the weights /3„ € M are defined as the solution 
to the linear system of interpolation equations 

= /(«"), (9) 

It is well-known that Eqn. is the unique minimiser 
of the i7*-norm under all functions in that satisfy 
the linear system in Eqn. |27] . In practice it may 
be necessary to regularise the linear system in order to 
facilitate inversion, but we do not go into details here, 
see e.g. P7] . 

We note that /[/m] will not have a closed-form ex¬ 
pression when the Matern kernel is employed and for 
this technical reason we instead employ tensor prod¬ 
ucts of polynomial kernels (these give rise to Sobolev 
spaces of mixed dominating smoothness - full details 
are provided at the end of this section). 

Theory: Deterministic Case. We begin by con¬ 
sidering the case where the design points are 

chosen deterministically. Define the ‘fill distance’ 

h{u^''^) := sup min 11® — m"II, 
the ‘separation radius’ 

:= ^niinllit^ - m'=|| 

Z j^k 

and the ‘mesh ratio’ := h(u^'^)/q{u^'^). 

The set is called ‘quasi-uniform’ if p(u^'^) —)■ 1 

as M —>• oo. 

Theorem 2. Under Assumptions 1-3 the CF estima¬ 
tor has error bounded by 

I[f]\ 

< CeH‘>L ||/||^/, 

where C > 0 is a constant that depends on a, aL and 
ajj but not on f, and . 

Proof. From [27] (Theorem 7.8) we have that the ker¬ 
nel estimator in Eqn. is consistent for the non- 
parametric regression problem at a rate 

||/-/M(-;«'^"'')||i/<^r 

where C depends only on a^a^^au. Combining this 
with Eqn. [^completes the proof. □ 

For quasi-uniform , there is no asymptotic 

penalty from employing a kernel if* that imposes ‘too 
much smoothness’ on the approximation /m, with 
p —1. In this case h{u^-^) = and, since M 

and N are proportional, h{u^-^) = How¬ 

ever the rate constant C will increase when too much 


smoothness is assumed so that, as a rule of thumb, we 
should try to select au as close as possible to a. Our 
main result is stated below: 

Corollary 1. When u^'-^ is quasi-uniform, CFs ac¬ 
celerate ar-QMC by a factor 0 {N~^°'~’^r}/dy 

Remark: The improvement due to CFs appears to be 
mainly limited to low-dimensional integrals {d small), 
but in fact CFs can in principle be extended to high¬ 
dimensional integrals under additional tractability as¬ 
sumptions, as discussed in Sec. 

Remark: Optimising the bound in Theorem enables 
us to obtain the optimal scaling 

M a — ar 

iV ^ ^ ’ 

see the Supplement for full details. 

The overall convergence rate of the CF estimator de¬ 
pends on how the design points are gener¬ 

ated. For this there are many QMC methodologies 
available, each leading to different convergence rates 
for the worst case error ch^l see |7] for a 

recent survey of some of these approaches. Of partic¬ 
ular interest in statistical applications is the case of 
random design points which we discuss below. 

Theory: Randomised Case. Modern QMC meth¬ 
ods begin with a deterministic set/sequence of design 
points (e.g. a Halton sequence or a Sobol sequence), 
then apply a random transformation leading to a low 
discrepancy set with high probability. Below we con¬ 
sider three types of randomisation; shifting, folding 
and scrambling. 

Shifting: In ‘random shift’ QMC the design points 
yM+i-.N g^j-g translated by a common uniform random 
vector A G [0,1]*^, so that v" i—>■ v" -|- A for each 
n = M -\- 1,. ■. ,N. For convenience we write this 
‘shifted’ set as -f A. Applying Theorem to 

vM+^-N _|_ ^ then marginalising over A G [0,1]'^ 
produces a RMSE bound for the CF estimator: 

Corollary 2. Under Assumptions 1-3 the random 
shift CF estimator has error bounded by 

^E\E[f; + a] - /[/]P 

where 

(ef.^ := [ CH^r. + AfdA 

“'[0,1]'^ 

and C > 0 is a constant that does not depend on f, 
vM+i-N or u^'-^. 
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For quasi-uniform , CFs accelerate random shift 
q;l-QMC by a factor (compare against 

Sec. 5.2 of [7]). 

Folding: A shifted and ‘folded’ QMC rule takes the 
form 

1 ^ 

Q,{f-z^-^ + A) ^/(b(.2" + A)) 

n—1 

where b is the ‘baker’s transformation’, given by 
bi{t) = 1 — |2ti — 1|. This transformation reduces error 
rates; for example, for / S S'i 72 ([ 0 , 1 ]'^) (defined be¬ 
low), folding and shifting a uniform lattice z^'-^ leads 
to a RMSE 0(A^“^+'^) that is smaller than the RMSE 
0(iV“i+'‘) for a shifted lattice (p. 59 of [7]). The CF 
estimator here is 

+ A] 

:= /[/m(-; + Qb[f - /m(-; -t A], 

For convenience we denote the shifted and folded de¬ 
sign points by 6(i;^+^'^-|-A). Applying Theoremj^to 
5(.yM-i-i:Af-|- A) and then marginalising over A S [0, l]'^ 
produces: 

Corollary 3. Under Assumptions 1-3 the shifted and 
folded CF estimator has error bounded by 

^JE\E^,[f;u^■■^,v^+^■■^ + A] - /[/]|2 

where 

(e^i ■-= [ en^, + A)fdA 

J[o,iU 

and C > 0 is a constant independent of f, 
andu^--^. 

Again, for quasi-uniform , CFs accelerate shifted 
and folded ol-QMC by a factor 0 {N~^°‘~°‘r)/d'j (com¬ 
pare against Sec. 5.9 of H)- 

Scrambling: An explicit a-QMC rule that applies for 
all integer values of a was recently discovered by [6]. 
For simplicity focussing on d = 1, these random design 
points achieve a-rates and, moreover, the RMSE is 
controlled by a norm of the form 11/11//°. When a 
is known and is an integer, one may achieve optimal 
rates and CFs provide no rate improvement. However, 
when a ^ N, CFs can be used to transform these sub- 
optimal integrators into optimal integrators. 

Choice of Kernel: The QMC-I-CF methodology has 
some flexibility in terms of the choice of kernel AT* 
that is used to construct the approximation /m- Our 
main requirements here are: (i) AT* imposes ‘enough 


smoothness’ on /m in order to be able to faithfully 
approximate / (Assumption 3). Moreover, AT, should 
be tunable to achieve a pre-specified minimum level 
of smoothness. Below we make an explicit connection 
between AT* and the order of the associated ‘native’ 
Sobolev space that will allow us to satisfy this require¬ 
ment. (ii) The functions Ar*(-,y) can be integrated 
analytically, so that /[/m] is available in closed form. 
This second requirement leads us to consider tensor 
products of Sobolev spaces, as described below. 

To construct analytically integrable functional approx¬ 
imations we consider kernels that are given by poly¬ 
nomials. Wendland’s compactly supported functions 
[33] are defined via the recursion 

Fd,k — [7’[rf/2j-t-fc+l] ? 

the base function (pi{r) = (1 — r)^ with := 
max{0,a;}, and the integral operator 

POO 

I[ip\{r) = / tip(t)dt 
J r 

(r > 0), so that 

(1 - rY^^pd,k{r), r e [0,1] 

0, r > 1 

where (. = \d/2\ -\- k -\- 1 and pd,k is a polynomial of 
degree k (see e.g. p.87 of [3] for explicit formulae). 
Then the kernel Kt,{x,y) = (pd,ki\\x — y||) has native 
space Hdji+k+xli the restriction d > 3 is in 

principle required for the special case k = 0) (see e.g. 
p.109 of 0). With this kernel we can therefore guar¬ 
antee a minimum level of smoothness. By rescaling, 
the kernel’s support can be changed from the unit ball 
(as above) to balls of smaller radius. This in turn en¬ 
forces sparsity on the system of interpolation equations 
that are the basis of the CF estimator and reduces the 
computational cost of inverting this linear system. 

Wendland’s kernel cannot be integrated analytically in 
d > 2 dimensions, violating requirement (ii). However 
we can exploit recent work by |29| that shows the d- 
dimensional tensor product space i7^([0,1]) 0 ■ • • 0 
i7^([0,1]) is norm-equivalent to SFI^ = 1]*^), 

the Sobolev space with dominating mixed smoothness: 

:= {/ : [0, ^ K I £)“/ exists and 

D°'f e Ap([0, l]‘’*),Va G Nq with Oi < k}. 

(The distinction with i7''([0, l]'^) is that the multi¬ 
index a is now constrained component-wise, Oi < k, 
rather than |a| < k.) In particular S'i7^([0,1]'^) C 
i7^([0, l]'^) so that functions in SH^ are at least as 
smooth as functions in We therefore propose to 
employ the product kernel 

d 

Ki^\x,y) = Y[‘fi,k{\^i-y^\) (10) 

2=1 
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whose native space is SH^+^. The integral 



Ki^'^ {x, y)dx 


of tensor products of Wendland functions in Eqn. 10 


can now be integrated analytically. This approach pro¬ 
vides a convenient mechanism to control the degree of 
smoothness that we impose on the approximation Jm- 


4 Experimental Results 

Our methodology provides a variance reduction tech¬ 
nique for QMC that is able to accelerate convergence 
rates, yet is also practical. The first numerical study 
below is a ‘proof-of-principle’ designed to validate this 
specific claim in the empirical setting. 

Simulation Study: For objective assessment we ex¬ 
ploited the test package proposed by m- This pack¬ 
age defines 6 function families, each of them charac¬ 
terized by some peculiarity, such as oscillation, dis¬ 
continuity or corner peaks, with the property that 
their exact integrals are available. The ‘discontinu¬ 
ous’ Genz function provides an example where smooth¬ 
ness assumptions on the test function are violated. 
We used the MATLAB implementation of El that 
is freely available at http://people.sc.fsu.edu/ 
~jburkardt/m_src/testpack/testpack.html, 

In the experiments below, we focus on the two QMC 
rules that are most widely used in practice. In the 
first experiment, the random QMC point set 
was generated by truncating the Halton sequence, 
scrambling the digits of the resulting points using the 
reverse-radix algorithm [16] and applying a uniform 
random shift. This QMC rule achieves the = I rate 
on the subsequence AQ = 2" when the test function 
has mixed partial derivatives of first order. To ensure 
that these QMC rules were implemented faithfully, we 
restricted attention to the case where M = N/2 so 
that N — M was always a power of two. The train¬ 
ing points u^'-^ were taken to be d-dimensional square 
lattices in all experiments. 

We considered the 6 Genz functions in d = 1,2, 3,4 
dimensions. The performance of QMC with and with¬ 
out CFs was compared, in each case ensuring that the 
total number of evaluations of the integrand / was 
equal for all methods. For CFs, the tensor-product 
Wendland kernel with k = 1 was employed (i.e. ap¬ 
proximation with functions /m G so au = 2). 
Results are presented in Fig. (For clarity we chose 
not present results for MC, since these were inferior to 
QMC methods in all cases considered.) For the first 5 
Genz functions it holds that / S id“ with a = 2 and 
theory (for the random case) guarantees an accelera¬ 
tion of this is borne out in experimental 


results. In the 6th, discontinuous case the QMC-I-CF 
method does not out-perform QMC (at least in di¬ 
mension d > 1), as the functional approximation Jm 
is poor due to violation of our continuity assump¬ 
tion. In all cases the performance of QMC-I-CF ap¬ 
proaches that of QMC as the dimension d increased. In 
higher dimensions (d > 5, not shown) the QMC-I-CF 
and QMC estimators demonstrated effectively identi¬ 
cal performance, in line with theory. 

The experiments were then repeated with rougher 
{k = 0) and smoother {k = 2) regression kernels. Re¬ 
sults in the Supplement (Figs. S3-8) demonstrated a 
slight improvement in the performance of QMC-I-CF 
when fc = 2, in line with theory, though generally es¬ 
timates were robust to the choice of regression kernel. 
To further assess the generality of these conclusions, 
further experiments were performed using a different 
QMC rule (truncated Sobol sequence with scrambling 
due to [H]). Results in the Supplement showed that 
the same conclusions can be drawn in each case. Taken 
together, these results demonstrate that CFs can ac¬ 
celerate QMC, at least in low-dimensional settings, 
and thus complete our ‘proof-of-principle’. MATLAB 
code to reproduce these results is provided. 

Application to Robot Arm Data: To demonstrate 
the benefits of our methodology we consider the prob¬ 
lem of estimating the inverse dynamics of a seven 
degrees-of-freedom robot arm. The task, as described 
in [25], is to map from a 21-dimensional input space 
(7 positions, 7 velocities, 7 accelerations) to the cor¬ 
responding 7 joint torques. Following [25] we present 
results below on just one of the mappings, from the 21 
input variables to the first of the seven torques. The 
dataset consists of 48,933 input-output pairs, of which 
44,484 were used as a training set and the remaining 
4,449 were used as a test set. The inputs were linearly 
rescaled to have mean zero and unit variance on the 
training set. The outputs were centred to have mean 
zero on the training set. 

We consider a hierarchical model based on 21- 
dimensional Gaussian process (CP) regression. De¬ 
note by 1/ G K a measured response variable at 
state Zi e assumed to satisfy Yi = g{zi) + 

Ci where ^ A(0, cr^) are independent for i = 
I,...,n and cr > 0 will be assumed known. In or¬ 
der to use training data to make predic¬ 

tions regarding an unseen test point 2 :*, we place a 
CP prior g ^ QV{0,c{z,z';9)) where c{z,z'-,6) = 
01 exp(—— z'W^). Here 0 = (0i,02) are hyper¬ 
parameters that control how training samples are 
used to predict the response at a new test point. 
A fully-Bayesian treatment aims to marginalise over 
these hyper-parameters and we assign independent pri¬ 
ors 01 ~ r(a, /3), 02 ~ r(7j<5) in the shape/scale 
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Figure 1: Simulation study (Genz functions): Each panel represents one test function. Solid lines correspond 
to standard QMC, dashed lines correspond to QMC+CF. O represents dimension d = 1, A represents d = 2, 
□ represents d = 3 and * represents d = 4. Experiments were replicated with 10 random seeds and error bars 
denote standard error of the replicate mean. QMC points were generated from a shifted and scrambled Halton 
sequence. A Wendland regression kernel was used with k = 1. 


parametrisation, which we write jointly as 7r(0). Here 
a = 0.1, a = /3 = 7 = i5 = 2. 

To predict the value of the response corresponding 
to an unseen state vector 2 ;*, our estimator will be the 
Bayesian posterior mean 

Y,:=E[Y,\y] = j E[Y,\y,e]Tr{e)de, (11) 

where we implicitly condition on the covariates 
Zi, ..., Zn, z^. Phrasing in terms of our earlier nota¬ 
tion, the test function is 

f{x) = E[y*|y,n“^(a:)] = -f cr^J„xn)“^y 

where H is the c.d.f for tt, {Cn)i,j = c{zi,Zj-,9) and 
= c{zt,, Zj;9). Each evaluation of the in¬ 
tegrand f{x) requires 0{n^) operations due to the 
matrix inversion and this entails a prohibitive level 
of computation. A partial solution is provided by a 
‘subset of regressors’ approximation 

f{x)K,Cif^n'{Cn',nCn,n'~\'0''^Cn') ^Cn',ny ( 12 ) 

where n' < n denotes a subset of the full data; see 
Sec. 8.3.1 of [5^ for full details. However even Eqn. 

still represents a substantial computational burden 
in general. To facilitate the illustration below, which 
investigates the sampling distribution of estimators. 


we took a random subset of n = 1, 000 training points 
and a subset of regressors approximation with n' = 
100. The total computational time needed to obtain 
these results was 268 core-hours. 

For each test point z^ the sampling standard deviation 
of Yi was estimated from 10 independent realisations 
of the QMC procedures. For CF we used a randomly- 
shifted, scrambled Halton sequence {a.^ = 1) and 
Wendland kernels with fc = 1 {au = 2), so that the¬ 
ory predicts an acceleration factor of The 

estimator standard deviations were estimated for all 
4,449 test points (with N = 2®) and the full results 
are shown in Fig. Note that each test point 2 ;* cor¬ 
responds to a different test function / and thus these 
results are quite objective, encompassing thousands of 
different integration problems. For the vast major¬ 
ity of integration problems, CF accelerated the stan¬ 
dard QMC estimator. Here the computational time 
to construct a functional approximation (inverting a 
16 X 16 matrix) was negligible (3%) in comparison 
to the cost of evaluating the function / once. The 
total additional computational time associated with 
the QMC-I-CF methodology was 2% greater than for 
QMC, which is easily justified by the substantial vari¬ 
ance reductions (~ 10®%) that are realised in this ap¬ 
plication. Supplementary results (Fig. S9) compare 
QMC-hCF to MC-hCF (standard MC sampling). 
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Figure 2: Application to robot arm data. Left: Posterior predictive means were computed for the mechanical 
torque experienced by one of the seven joints of the arm, for each of 4,449 joint configurations. Schematic 
reproduced from [32]. Centre: Model hyper-parameters were integrated out; for this task we compared standard 
QMC with the proposed QMC-fCF approach (both implementations provided unbiased estimators). Right: 
Examining the estimator sampling standard deviations, we see that, for all but a handful of the configurations, 
QMC-I-CF was more accurate than QMC. 


5 Discussion 

QMC methods are becoming increasingly relevant in 
modern statistics applications [12 [^ and it is surely 
a priority to target the rate constants governing the 
practical performance of these algorithms. CFs pro¬ 
vide one route to achieve this goal, providing substan¬ 
tial variance reductions in many of the examples we 
considered. Indeed, CFs allow us to use a sub-optimal 
QMC rule (e.g. as built into existing software pack¬ 
ages) and yet, with minimal additional coding, obtain 
a QMC-I-CF algorithm that attains optimal conver¬ 
gence rates. The focus on unknown smoothness a dis¬ 
tinguishes our work from previous literature on the 
connection between integration and functional approx¬ 
imation, e.g. mm- 

Functional approximation, and hence our QMC-I-CF 
methodology, has a computational cost associated 
with solution of a linear system. Whilst negligible in 
our experiments, this cost could be reduced if neces¬ 
sary using standard approximations and/or compactly 
supported kernels. On the other hand, we note that 
QMC is often used when / is expensive to evaluate 
and in such situations it is likely that evaluation of 
the integrand, rather than solution of a linear system, 
will be the main computational bottleneck. 

Our focus was on Sobolev spaces, but it is known that 
a faster rate 0{N~°‘+^) is possible in the subspace 
5'FI“([0, l]'^), for any e > 0, and explicit point sets 
are available (for integer a) 0. An immediate exten¬ 
sion is to establish optimal rates for CFs in this class 
of functions. In a related direction, one can in princi¬ 


ple obtain dimension-independent rates by imposing a 
(strong) assumption of polynomial tractability on the 
RKHS. This is achieved by generalising to weighted 
Sobolev spaces, such that the integrand / ‘depends 
only weakly on most of the components of a;’. Further 
details are provided in [T] [20| and form part of our 
ongoing research. 

The methods that we describe are immediately appli¬ 
cable in a range of applications including marginalisa¬ 
tion of hyper-parameters in classification uni, proba¬ 
bilistic inference for differential equations [281 [5| , com¬ 
putation of model evidence [21] and approximation of 
the partition function in social network models [26] . 
Finally we note that CFs generalise to other integra¬ 
tion methods including Bayesian Quadrature m E] 
and related kernel-based quadrature rules [2| , in which 
the worst case error is also controlled by an RKHS 
norm ||/||_tr; this will be the focus of our ongoing re¬ 
search. 
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Supplementary Text 

In this section we provide details for how to allocate computational resources between the sets u^'-^ and 
trading off integration error with functional approximation error. 

Proposition. The optimal scaling of M/N € (0,1), in the sense of asymptotically minimising the QMC+CF 
absolute error, is given by 

M a — ar 

TV ^ ^ ^ a ■ 

Proof. From Theoremwith u^'-^ quasi-uniform, the QMC-I-CF error is bounded above by 

\E[f-u^'-^,v^+^'-^]- I[f]\ < C/eff-z. (13) 

for some constant Cf € (0, oo). 

For Oi-optimal QMC we have that ch'^l = 0{{N — 

Suppose that M = for some m S N. Then since u^'-^ are quasi-uniform it follows (from considering a regular 
square lattice) that h{u^'^) = This gives the general scaling h{u^'^) = 

Writing M = cN for some c we obtain from Eqn. [^the objective function 

J(c) = (N - X (d-i/2(cfV)-^/^)“-“^ 

that we wish to minimise over c G [0,1). Solving for J'(c) = 0 completes the argument. □ 

Supplementary Experimental Results 

This section contains all simulated data results discussed in the paper. Specifically, for each of the 6 test 
functions described by we display results based on Wendland’s compactly supported regression kernel [34] 
with smoothness parameter k (described above) set equal to either 

• A: = 0, or 

• A: = 1, or 

• k = 2 

in combination with QMC design points generated from either 

• a Halton sequence, deterministically scrambled using the reverse radix algorithm [16j . and then applying a 
random shift or 

• a Sobol sequence, randomly scrambled using the algorithm of [IB] . 

We used the MATLAB implementation of m that is freely available (web address given in the Main Text). 
The QMC design points can be generated using the in-build MATLAB functions haltonset, sobolset and 
scramble. Full MATLAB code used to generate these results is provided in the Electronic Supplement. 
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Genz Function ^1-. Oscillatory Test Function 



Figure S3: Numerical results: Each panel represents one of the 6 QMC-I-CF formulations. Solid lines correspond 
to standard QMC, dashed lines correspond to QMC+CF. O represents dimension d = 1, A represents d = 2, □ 
represents d = 3 and * represents d = 4. Experiments were replicated with 10 random seeds and error bars denote 
standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or 
a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k. 
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Genz Function ^2-. Product Peak Test Function 



Figure S4: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond 
to standard QMC, dashed lines correspond to QMC+CF. O represents dimension d = 1, A represents d = 2, □ 
represents d = 3 and * represents d = 4. Experiments were replicated with 10 random seeds and error bars denote 
standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or 
a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k. 
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Genz Function Corner Peak Test Function 



Figure S5: Numerical results: Each panel represents one of the 6 QMC-I-CF formulations. Solid lines correspond 
to standard QMC, dashed lines correspond to QMC+CF. O represents dimension d = 1, A represents d = 2, □ 
represents d = 3 and * represents d = 4. Experiments were replicated with 10 random seeds and error bars denote 
standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or 
a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k. 
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Genz Function ^4.-. Gaussian Test Function 



Figure S6: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond 
to standard QMC, dashed lines correspond to QMC+CF. O represents dimension d = 1, A represents d = 2, □ 
represents d = 3 and * represents d = 4. Experiments were replicated with 10 random seeds and error bars denote 
standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or 
a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k. 
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Genz Function Continuous Test Function 



Figure S7: Numerical results: Each panel represents one of the 6 QMC-I-CF formulations. Solid lines correspond 
to standard QMC, dashed lines correspond to QMC+CF. O represents dimension d = 1, A represents d = 2, □ 
represents d = 3 and * represents d = 4. Experiments were replicated with 10 random seeds and error bars denote 
standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or 
a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k. 
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Genz Function Discontinuous Test Function 



Figure S8: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond 
to standard QMC, dashed lines correspond to QMC+CF. O represents dimension d = 1, A represents d = 2, □ 
represents d = 3 and * represents d = 4. Experiments were replicated with 10 random seeds and error bars denote 
standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or 
a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k. 
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Robot Arm Example: Additional Results 

We re-ran the robot arm simulation in order to compare the QMC-I-CF estimator with the MC-I-CF estimator; 
that is, a quasi-uniform set were used to construct a control functional Jm, whilst a Monte Carlo sample 
^M+i-.N integrate the difference / — /m- 



Figure S9: Application to robot arm data: Examining the estimator sampling standard deviations, we see that, 
for all but a handful of the configurations, QMC-I-CF was more accurate than MC-I-CF. 




